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Abstract — Data association, the problem of reasoning over 
correspondence between targets and measurements, is a fun- 
damental problem in tracking tracking. This paper presents 
a graphical model formulation of data association and applies 
an approximate inference method, belief propagation (BP), to 
obtain marginal association probabilities. We prove that BP is 
guaranteed to converge, and bound the number of iterations 
required for convergence. Experiments reveal a favourable com- 
parison to prior methods in terms of accuracy and computational 
complexity. 

Index Terms — Data association, tracking, JPDA, graphical 
models, belief propagation, cycles, convergence guarantees 

I. Introduction 

In recent years, graphical models have emerged as a pow- 
erful tool for inference and learning in large scale systems. 
The promise of graphical models in tracking problems was 
demonstrated in ||T|, 10, jS). The formulation in the former 
focused on sensor networks, in which each sensor had a 
narrow field of view. Non-overlapping regions were defined 
and association variables were instantiated to hypothesise joint 
association events for all targets and measurements within a 
region. Graphical models were studied in a similar application 
in a. 

In the present study, we consider the classical data asso- 
ciation problem, in which a single sensor surveils a large 
number of targets. Each target may give rise to at most one 
measurement, and each measurement is related to at most 
one target. We focus on an approximate solution of a core 
problem in data association: calculating marginal association 
probabilities such as those used in joint probabilistic data 
association (JPDA) |5|, multi-target mixture reduction lO, Q, 
and related methods 111,191. 

Calculation of marginal association probabilities is closely 
related to computation of the permanent of a non-negative 
matrix 1 10], a key problem in the definition of the #P-complete 
complexity class [11 1. Brute force, exact calculation of these 
quantities is intractable for all but the smallest problems. Until 
recently, practical applications of these methods have relied on 
simple heuristics such as those outlined in |12|. Significant 
improvements have been made over the past decade. The 
efficient hypothesis management (EHM) method ifTSl . lfT4ll 
exploits redundancy present in many problems to provide exact 
evaluation with reduced complexity. In many cases, however, 
complexity remains exponential. Monte Carlo Markov chain 



data association (MCMCDA) [15| provides a randomised, 
fully polynomial time approximation scheme (FPTAS) for 
calculating the probabilities, but the computational complexity 
of the method limits its practical use. Subsequent to our initial 
publications |fT6l . lfT71 . a related correlation decay method 
1 18 1 has been proposed, providing a deterministic FPTAS. As 
we show in Section [Vj the computational complexity of this 
approach remains problematic. 

This paper develops a practical approximation approach to 
data association based on belief propagation (BP), demon- 
strating remarkable accuracy, and proving convergence of the 
algorithm, despite the presence of cycles in the graphical 
model formulation. 



A. Contributions 

In this paper, we examine an emerging method for approx- 
imating the marginal association probabilities. Motivated by 
the success of lfT9l . the method was first proposed in |20|, 
and subsequently studied in ||2TI . Il22l . It was developed 
independently and evaluated in the tracking context in our 
preliminary paper y_6J. The contributions of this paper are as 
follows: 

• Proof of convergence of the method for the most common 
case in tracking where the probability of detection is non- 
unity, and the false alarm rate is non-zero. A preliminary 
version of this result was published in ifTTl . A more 
general proof (effectively admitting cases with unity 
probability of detection and/or zero false alarm rate), 
developed in parallel, was announced in ESl . and is 
available in ll24l . 

• Analysis of the computational complexity of the method, 
providing guarantees on the number of iterations required 
as a function of the problem parameters, and interpreta- 
tion of these parameters in the tracking context. 

• A thorough experimental evaluation of the accuracy 
of the approximation in challenging tracking problems, 
and comparison (including computation time) to other 
state-of-the-art methods in the tracking literature. The 
comparison reveals the unique position of the proposed 
approximation in the accuracy vs computation time trade- 
off. 
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B. Outline of paper 

In Section [nj we introduce the problem of data association 
and the particular formulation we utilise (Section II-A i, before 
introducing the formalism of graphical models (Section [II-B| |. 
In Section III we derive our approach, prove convergence, 
bound computation time, and examine practical stopping cri- 



teria. In Section IV we compare and contrast our method to 



two closely related approaches. In Section [Vj we present a 
thorough experimental comparison of our approach to state- 
of-the-art alternatives. 

II. Background 
A. Data association model 

We analyse a variant of the classical model {e.g., 15j) which 
incorporates uncertainty in target existence using the random 
finite set (RFS) formalism of fI5\. The focus of this paper is 
approximate calculation of the marginal association probabil- 
ities; we describe the model for concreteness, and in order to 



provide intuition for how the model parameters (Section III-E i 
impact the number of iterations required for convergence. 
Details of the model and derivation of RFS filters which use 
the present algorithm may be found in |9|. 

Assume that at time t there are rit targets with states Xt — 



{xj , . . . , a;"*} and rat measurements Zt = {z},- 



'}■ 



Each target may give rise to at most one measurement (with 
detection probability Pi{xt)), and each measurement may 
result from at most one target (false alarms occur according 
to a Poisson point process with intensity Afa(zt)). New targets 
arrive at each time according to a Poisson point process 
with intensity An(a;t){^ Target dynamics models (including 
death) are not important to the analysis we wish to perform. 
The measurement likelihood is f{zt\xt). The complete set 
of measurements up to and including time t is denoted as 
Z* — {Zi, . . . ,Zt). Under these assumptions, measurements 
from either false alarms or new targets will be a Poisson point 
process with intensity 



Afan(2;) = Afa(2:) + j f {z\x)X^{x)d: 



\X 



The relationship between targets and measurements is de- 
scribed via a set of latent association variables, comprising: 

1) For each target « e {!,..., rit}, an association variable 
a\ G {0,1,..., mt}, the value of which is an index to 
the measurement with which the target is hypothesised 
to be associated (zero if the target is hypothesised to 
have not been detected) 

2) For each measurement j E {1, . . . ,mt}, an association 
variable bl £ {0, 1, . . . , rif }, the value of which is 
an index to the target with which the measurement is 
hypothesised to be associated (zero if the measurement 
is hypothesised to be either a false alarm or a new target) 

Note that the two sets of association variables are entirely 
redundant: given the information from either set, the other 
can be reconstructed perfectly. As we will see in the following 

'For simplicity, we assume that new targets are guaranteed to be detected; 
the method in | 9 | shows how this assumption can be relaxed. 



sections, this choice of formulation results in an approximate 
algorithm with guaranteed convergence, and remarkable accu- 
racy. 

In essence, JPDA and related methods seek to calculate the 
posterior distribution 

f{xl...,x-^\Z') = 
V p{al ...,aT \Z')f{xl, ...,x7^\al..., Z') 



where Z* — {Zi, . . . , Zt). This is accomplished approximately 
by observing that if the prior distribution is approximated as 

f{xl...,x-*\Z'-') = l[fixl\Z'-') 

i=l 

then, under the model stated, 

f{xl xT \a\,..., aT,Z') = n 

i=l 

Subsequently, by approximating the joint distribution of asso- 
ciation events by the product of its marginals 

p{a],...,a-^\Z')^\[p{a},\Z') 

an approximation of the full posterior can be obtained: 

f{x\, x-^\Z') « \{Y,p{a}t\Z')f{x\Wt. Z') 

Target existence can be incorporated similarly as in ||9l, 1261 , 
or using the related methods ETl . The hypothesis-conditioned 
updated distribution f{x\\a\,Z*) may be calculated using 
simple state estimation methods such as a Kalman filter li28J , 
unscented Kalman filter 1291 or particle filter ll30l depending 
on the model in use. The challenging problem is computation 
of the marginal probability p{al\Z*); this computation is #P 
complete. Under the assumptions stated, the joint distribution 
can be expressed as 

p(ai,..., a'/S 5r|Z*)cx 

nt ( mt I 

n|^,K)nv'^,K,foD| (1) 

where the factors 

/ f « hJ\ = ^ i or H ^ha\^ i 

^i,j{at,K) ^ { . . . (2) 
I 1, otherwise 

collectively ensure that the redundant sets of association 
variables (a^ , . . . , a") and (6^ , . . . , 6™') are consistent, i.e., 
that any event in which the collections are inconsistent has 
zero probability. An example of an inconsistent event is one 
in which the target association variable a\ indicates that target 
i is associated with measurement j, but the measurement 
association variable hi does not indicate that measurement j 
is associated with target i; this implicitly excludes cases in 
which the same measurement is associated with two targets, 
or the same target is associated with two measurements. The 
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factors ipi encode the problem data, with -tpiia]. = 0) = 1 and 
for j > 0, 

XU4) [1 - 4 1 Paixi)fixl\Z^-^)dxl] 

where is the probability of existence of the j-th track prior 
to incorporation of the new measurementj^ We reinforce that 
the details of this model are not the topic of this paper; for 
the present context, the model is a minor variation of previous 
works such as Q, ||2l, [81. The topic of interest is how we 
might tractably estimate the marginal association probabilities 

p{al\Z')= P{al...,a-\bl...,bnz') (4) 

piH\Z')^ J2 p{al...,ar,bl---,br^\Z') (5) 

aj V 'i; bl 

B. Graphical models 

Graphical models ||3T1l . ||32l . 1331 aim to represent and ma- 
nipulate the joint probability distributions of many variables 
efficiently by exploiting factorisation. The Kalman filter [34l| 
and the hidden Markov model (HMM) [35 J are two examples 
of algorithms that exploit sparsity of a particular kind {i.e., 
a Markov chain) to efficiently conduct inference on systems 
involving many random variables. Inference methods based on 
the graphical model framework generalise these algorithms to 
a wider variety of state spaces and dependency structures. 

Graphical model methods have been developed for undi- 
rected graphical models (Markov random fields), directed 
graphical models (Bayes nets) and factor graphs. In this 
work we consider a subclass of pairwise undirected models, 
involving nodes (i.e., random variables) n E Af, and edges 
(i.e., dependencies) e £ £ C Af x Af, and where the joint 
distribution can be written as£] 

It should be immediately clear that ([T]l is in this form. 
As another example, a Markov chain involving variables 
(xi, . . . ,xt) may be formulated by setting -01 (xi) = p{xi) 
for the initial prior, ipt{xt) = 1 for t > 1, and edges 
tpt-i,t{xt-i,xt) = p{xt\xt-i), t e {2,...,r} representing 
the Markov transition kernels, although other formulations are 
possible. 

Optimal inference can be conducted on tree-structured 
graphs using belief propagation (BP). BP proceeds by pass- 
ing messages between neighbouring nodes. We denote by 
fii^j{xj) the message sent from node i E Af to node j G Af 
where G £. The iterative update equations are then: 

(6) 

^We do not consider update equations for this quantity, lience we do not 
require notation for tlie post-update probability of existence. 

^In the general setting, the joint distribution is a product of maximal cliques 
1321 p9]. Since the graph is undirected, we assume that £ is symmetric, i.e., 
if (*> j) S ^ then (j, i) S £. We need only incorporate one of these two 
factors in the distribution. 



For obvious reasons, the algorithm is also known as sum- 
product. If the summations are replaced with maximisation 
operations, then we arrive at max-product BP, which gener- 
ahses the well-known Viterbi algorithm 1361 . providing the 
MAP joint state of all variables in the graph. At convergence 
of sum-product BP, the marginal distribution at a node n can 
be calculated as: 

p{Xn) OC 1pn{Xn) J]^ /^.,^„(x„) (7) 
(n,i)6£ 

In the case of a Markov chain, if all nodes are jointly Gaussian, 
then BP is equivalent to a Kalman smoother. Similarly, if 
all nodes are discrete, then BP is equivalent to inference on 
an HMM using the forwards-backwards algorithm. BP unifies 
these algorithms, and extends them from chains to trees. 

Inference in cyclic graphs (graphs that have cycles, i.e., that 
are not tree-structured) is far more challenging. Conceptually, 
one can always convert an arbitrary cyclic graph to a tree by 
merging nodes (e.g., so-called junction tree representations) 
I3TI . l33l, but in practical problems, the dimensionality of the 
agglomerated variables may be prohibitive. BP may be applied 
to cyclic graphs; practically, this simply involves repeated 
application of (j6]) until convergence occurs (i.e., until the 
maximum error between subsequent messages is less than a 
pre-set threshold). Unfortunately, this is neither guaranteed to 
converge to the right answer, nor to converge at all. Never- 
theless, and perhaps surprisingly, it has exhibited excellent 
empirical performance in many practical problems ll37l . For 
example, the popular iterative turbo decoding algorithm has 
been shown to be an instance of BP applied to a cyclic graph 

The current understanding of BP in cyclic graphs stems 
largely from [39|. It has been shown (e.g., l32, Theorem 3.4]) 
that one can recover exact marginal probabilities from an opti- 
misation of a convex function known as the Gibbs free energy, 
one term of which is the joint entropy of the distribution. 
While this optimisation is intractable, it points to a family of 
variational inference methods that approximate the objective 
function and the feasible set to enable calculation of marginal 
probability estimates without ever manipulating the full joint 
distribution |40|, 132J. It has been shown |(39! that BP (when 
it converges) solves one such approximation, in which the 
objective is approximated by the Bethe free energy (replacing 
the joint entropy with a series of differences of pairwise edge 
entropies and node entropies), and the feasible set (i.e., the set 
of all valid probability distributions) is approximated as the 
distributions for which the pairwise joint distributions along 
edges are consistent with node marginals 1321 4.1.1]. The BP 
message iterates can be viewed as a general iterative method 
for solving a series of fixed point equations derived from 
the optimality conditions of the Bethe free energy variational 
problem ^ 4.1.3]. 

The two approximations made by BP each lead to diffi- 
culties in certain circumstances. Firstly, whereas the Gibbs 
free energy is convex (since entropy is concave), the Bethe 
free energy is, in general, neither concave nor convex. The 
practical behaviour is that errors are often either very small or 
very large, and the general intuition is that failure of convexity 
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leads to large errors; this has led to new families of algorithms 
such as tree re-weighted sum product |41|, which replace 
the Bethe free energy with a convex surrogate. Secondly, the 
approximation of the feasible set may admit points which do 
not correspond to any valid probability distribution, hence the 
solution may be infeasible. Methods seeking to tighten the 
feasible set include ll42l . 

In the case in which the graph is a tree, Bethe free energy 
is equivalent to Gibbs free energy (and hence convex), and the 
feasible set is exact. Not surprisingly, then, BP is optimal on 
trees. In recent years, a number of other cases have emerged 
in which there are guarantees on aspects of the algorithm; this 
paper studies one of these cases. In lfT9l . it was shown that 
max-product LBP can be used to optimally solve 2D assign- 
ment problems in time comparable to the well-known auction 
algorithm. This paper concerns the calculation of marginal 
probabilities on the same model, and utilises the same graph 
formulation. In |24|, it was shown that, in this model, Bethe 
free energy can be parameterised by the doubly stochastic 
matrix of marginal target-measurement probabilities, and is 
convex with respect to this parameterisation. Furthermore, 
the Birkhoff-von Neumann theorem states that any doubly- 
stochastic matrix is a convex combination of permutations |l 24l 
(e.g., different association configurations), hence infeasibility 
is also ruled outj^In the following section, we state the model, 
and derive results relating to convergence and complexity of 
BP in this model. 

III. Belief propagation data association 

In this section, we consider the use of BP to approximate the 
association probabilities in ([T]l. We commence in Section III-A 
by explicitly stating the BP update equations, and then (in 



Section III-B i obtain an equivalent form that reduces com- 



putation complexity from 0{rqrril + rqrrq) per iteration to 
0{ntrat). The reduction is effectively a sum-product version 
of the simplified algorithm in [19 1, and an equivalent form 



was provided in |20]. In Section III-C we use the simplified 



update equations to prove convergence of the algorithm, and 



then, in Section III-D bound the number of iterations required 
for convergence. In Section |III-E| we calculate the expected 
value of the quantity involved in the bound on the number 
of iterations required, providing intuition into the practical 
behaviour of the method. In Section III-F we provide a 



criterion for terminating the BP computation with a guarantee 
of the deviation from the converged solution. 

A. Formulation 

The model we study is that of ([T]). As illustrated in Fig. [T] this 
is a bipartite model in which all target association variables 



G{l,...,nt} 



are connected to all measurement association 



variables (6f )je{i....,mt}- In this case, BP may be implemented 
via two half-iterations, alternating between the two sets of 
messages, {iJ-f^i^i,]) and (/if,3_j.^i)' abbreviating notation, we 
refer to these as {^i^j) and (/ij-yi) respectively. The message 
update equations are: 

'^i.e., there is guaranteed to be a distribution over joint associations that 
yields the marginal probabilities that BP obtains. 




Fig. 1. Graphical model formulation employed for data association. The value 
of the random variable is the index of the measurement with which target 
i is hypothesised to be associated, while the value of the random variable 
is the index of the target with which measurement j is hypothesised to be 
associated. 



. (9) 



(10) 



(11) 



Naively, the complexity per iteration of this procedure is 
0{n^m'^ + n^m'l), since each iteration involves sending a 
message from each of the rit target association variables to 
each of the rrit measurement association variables (and vice 
versa), and each message involves {nit + 1) (respectively 
{rit + 1)) values, each of which requires 0{n^) (respectively 
0[m'^)) calculations. 



B. Simplified algorithm 

As shown in |fT9l , EOl . ifTTl . these computations can be 
dramatically simplified by observing that in (j9]l (and ([TT)), 
while the message consists of [rit + 1) (respectively {rat + 1)) 
values there are only two distinct values {i.e., = i and i 
in (j9|, and aj = j and aj 7^ j in ( 1 1 1). Further, since we are 



free to renormalise messages, we may divide by one of these 
two values {^i^j{bl ^ i) and ^j^i{a\ 7^ j) respectively) to 
obtain a scalar representation of the message. The message 
update equations in terms of these scalars become: 



i + EjVjj'>o^»0'')m/ 
1 



1 + Ei'^j,i'>o Mi'^j 



(12) 
(13) 



where we have exploited the choice that '0i(«t = 0) = 1 to 
reinforce that the denominator is greater than zero. Upon con- 
vergence, the approximate marginal association probabilities 
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can be obtained by 



Proof: First note that 



(14) 
(15) 



where /io^i — 1 and /lo-s-j — 1- An 0{ntmt) (per iteration) 
Matlab implementation of these update equations was given 
in mil. 



C. Proof of convergence 

Most previous work on convergence of BP (such as P3l ) 
requires the factors of the graph to satisfy conditions on dy- 
namic range {e.g., ma.Xafiipi.j{a,b)/ miuafiipi.jiajb)). These 
methods do not apply to the present case since the factor 
ipij{a,b) has infinite dynamic range. Nevertheless, we show 
here that it is possible to establish that the simplified expres- 



sions in Eqs. ( 12 1 and ( 13 1 are contractions, thus guaranteeing 
convergence of the method. The property we exploit in the 
proof is the same as the correlation decay property utilised in 
the recent work |18|. 

A function / : A" — > A" operating in a metric space X with 
distance metric d : A" x A" — > M>o is a contraction if there is a 
contraction factor a < 1 such that d{f{x), f{y)) < ad{x, y) V 
x,y G X II44I 9.22]. If / is a contraction on a complete metric 
space, then any sequence resulting from repeated application 
of / will converge to the same fixed point Ii44. 9.23]. Following 
ll43l . we use the distance metric 



d{n,fi) 



max 

i.j 



(16) 



where we use the convention 2 = l|^ To show that this 
satisfies the triangle inequality, note that 



— max 


log 


















< max 


log 




+ max 


log ^; - 












= d{p,,v) - 


Yd{u,jj.) 





Other properties of a distance metric are trivially satisfied. 



Let /jb = f{u) be the update defined in (12i in vec- 
tor form with i> = ....«*} je{i,...,mt}, M = 
(/^wj)ie{i,...,«t}je{i,...,mt}, and v = g{p) be the update 



defined in ( 13 1. We now prove a preliminary result relating to 
the form of the contraction factor that we use. 

Lemma 1. For L > 1 and c > 0, the function 

log (^) 



a{L, c) = 



logL 



(17) 



is strictly less than one, and is monotonically increasing in L. 

'in our problem setup, elements will either be bounded below by a strictly 
positive constant, or zero at all points in the sequence (when ipiQ) = 0), 
hence we will never measure the distance between a zero element and a non- 
zero element (which would be infinite). 



This gives the first result. For the second result, it suffices to 
show that log a(L,c) > 0, i.e., that 

' + ^'^^ogfi±^)<^LlogL 
^ V 1 + c / " 1 + c ^ 

This result is an immediate consequence of convexity of 
xlogx, since ^jt^ = TTS^ + TTS^' " 
The following two lemmas establish that the BP updates in 
Eqs. (12 1 and ( [T3] l are contractions. 

Lemma 2. For all {v^u) with diy,!/) < logL, the message 
update /(•) is a contraction with respect to the distance metric 
d{-, ■) with factor a{L, W^,), where 

^ — ^ i 

aj>0 

Proof Assume that, V < ^j^i < 1 and < 

Aj^i < 1; this is not restrictive as it is guaranteed to be 



satisfied for any {u, u) resulting from ( 13 1. Let 



L = exp diy, £>) < L < 00 

Then iij^i < Ljlj^i and jlj^i < L^j^i. If ipi{j) — then 

f-hji^^) = f-hjit^) = 0' ™d fi,j{'^)/fi.j{'^) - 1- Otherwise, 
consider the quotient 

ft,j{l^) ^ l + Ej'^j,3'>oV'»(/)Mj'^i 



< 



l + cL 1 + W^L 

i + c - i + w^ 



where c = J2j'^j,j'>o < < W^., and the final 

step uses the second result of Lemma [T] Following similar 
steps. 

After combining these two cases, taking logs and observing 
that they apply for all we are done. ■ 

We now show the same result for the equation in the 
alternative step, ( [T3| ). 

Lemma 3. For all (/x, fi) with p,) < logX, the message 
update g{-) is a contraction with respect to the distance metric 
d{-,-) with factor a{L,W*), where 



E 

i>0 



maxM^-* 
j 



Proof: Assume that, V < fii^j < and 

< jj-i-^j < i^iii), and that /i^^j = or jli^j = if and only 
if "01 (j) = (in which case fii^j/fii^j ^ 1). Again, this is 
not restrictive as it is guaranteed to be satisfied for any (/x, /i) 



6 



resulting from (12i. Consequently, J2i f-'-i^j 



(and similarly for /i). Let 

L = Gxp d{fi, fi) < L < oo 
Then, following the same steps as in the previous proof. 
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(m) _ 1 + 



^ l + W*L ^ 

- i + w* - 



Since this is true for all we take logs, combine cases, 

and obtain the desired result. ■ 

Theorem 1. The loopy BP fixed point iteration /(•), g{-) 
converges to the same stationary point regardless of the 
initialisation. 

We omit the proof of the theorem since it is a straight- 
forward application of contraction mapping results; see l! 44l 
Sect 9.23], and note that the message iterates are contained in 
a compact subset of the space (i.e., with elements either being 
zero for all k, or being bounded below and above by strictly 
positive finite constants) hence establishing completeness. 

D. Bound on complexity 

Having proven convergence of the algorithm, we now switch to 
analysing the computational complexity involved in attaining 
convergence. Since we are ultimately interested in the marginal 
probability estimates, we start with a lemma that relates the 
deviation in the BP messages from their final values to the 
deviation in the marginal probability estimates (we use the 
term deviation to describe the difference between the current 
iterate and the converged solution). For convenience we define 
the shorthand p^{i) == p{a\ ~ j\Z*). 



Lemma 4. // 



log 



H- 



< e V and p and p are 

calculated from /i and jl via (14\, then \p^{j) —p^{j)\ < '5(e) 
V (i,i) where 5{e) 



exp(2e) — 1 w 2e (for small e). 
Proof: We seek to bound 



< 



- 1 



(18) 



Trivially, exp(— e) < < exp(e). The second term can be 



bounded similarly since 

J2j' fij'^zMf) ^ Ej' exp(e)/Xjv^,'(/;,(j') 



Ej' fj'j'^ii^iij') Ej' fJ'j'-^tMJ') 

Substituting these bounds into ( fTsj l we find 



exp(e) 



exp(-2e) - 1 < 



1 < exp(2e) - 1 



The desired result is then obtained by observing that < 
1 — cxp(— 2e) < exp(2e) — 1. Finally, note that exp(2e) — 1 = 
2e + O(e^) for small e, so that the error is well-approximated 
as 2e. m 
Using the previous lemma, we now show that the number 
of iterations required for the deviation to be less than a desired 
level is bounded through a simple closed-form expression. In 



what follows, we denote by /i 



and 



of the messages from ( 12 1 and (13 1, and by fj,i^j.* and fij 



the fc-th iterate 



the messages upon convergence. 

Theorem 2. Starting from iJ.j^i_o — 1 ond given W^, and 
W* , deviation between the marginal probabilities and the 
approximations obtained at convergence is guaranteed to be 
no more than e ;/ the number of iterations k satisfies 

log[exp(2£)-l]~loglog(l + iy,) 
- loga(l + W^*,T4^*)+loga(l + W^*,M^*) 
Looser bounds that only consider the contractions correspond- 
ing to one half-message also apply: 

log[exp(2e) - 1] - loglog(l + W*) 



1 > 



k > 



loga(l 
log[exp(2e) — 1] 



- loglog(l + W*) 



(19) 



(20) 



loga(l + W^*,M^*) 
Proof: First, we bound the distance between the messages 
at initialisation. The point which we choose to bound is the 
message ^i^jfi which results from a single application of (12 1 
from the initialisation fij^i.o = 1. We examine the ratio 

l + E/^,,.,-'>o^i(/)A*j'^^* 



1 



It is clear that any fij^i 
li-j^i < 1- Consequently, 
1 

< 



resulting from ( [T3] l will satisfy < 



1 



1 + VK* - 1 4 
Thus, after half an iteration. 



log 



, <log(l+W,) 

Thus we set Z = 1 + VF*, to obtain a contraction factor 
a = a(l + W»,W^)a{l + W*,W*) (combining the two 
half-iterations). After one more half-iteration (neglecting the 
contraction that this introduces for convenience), we obtain: 



log 



< log(l + 



After (fc — 1) subsequent iterations, the deviation will be 
bounded by 



log 



< a'^-Mog(l 



In order to obtain the desired bound on the marginal proba- 
bility deviation, we seek k such that 

a'=~Mog(l + W,)< exp(2e) - 1 

{k - 1) loga < log[exp(2e) - 1] - loglog(l + W^,) 
log[ exp(2£) - 1] - loglog(l + W,) 
loga 



1 > 
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Fig. 2. Bounds on the number of iterations required for convergence (i.e., to 
guarantee that the resulting marginal estimates are within 10^'^ of their final 
values). The blue line shows the closed-form bound of |l9), while the green 
line shows the tighter bound computed using Corollary [11 



The first loosened bound is obtained by observing that a < 
a(l + VF*, W*), i.e., only incorporating the contraction factor 
for one of the two half-iterations. The second is obtained by 
observing that 

1 



1 + W* 



< 1 



and following similar subsequent steps. ■ 
The analysis in Theorem |2] can be tightened via a bound 
that is computed numerically. As iterations progress, we are 
guaranteed to be closer to the optimal solution. Correspond- 
ingly, the contraction factor reduces, hastening convergence. 
The closed-form analysis does not exploit this fact, rather 
assuming that the contraction factor remains at its original, 
worst-case value. The tighter, computable bound is described 
in Corollary [T] 

Corollary 1. Assume that we commence from fij^i^ — 1, so 
that 



log 



<log(l + W^*) = Li 



Then for k > 1 



where 



log 



a(expLfc_i, 



Fig. |2] shows the number of iterations k required to guaran- 
tee deviation S = 10~^ in the marginal estimates for various 
W^, ranging in [0, 1000]. The green line shows the computable 
bound from Corollary [T] while the blue line shows the closed- 
form bound from Theorem |2] (i.e., the first loosened bound, 
which depends only on W^). The computable bound is well- 
approximated as being log-log-linear, while the closed-form 
bound is well-approximated as being log-linear 



E. Interpretation of complexity 

The previous section showed that the number of iterations 
required depends on the parameters — max^ Wi and/or 
W* = maxj . In this section, we evaluate the expected 
value Ez\x[Wi] when the true target positions are X = 
{x^, . . . , x"} in order to provide intuition on how this param- 
eter relates to the problem parameters. Modifying notation to 
explicitly incorporate the measurement set Z, 

zez 

Simplifying the model to a uniform probability of detection, 
false alarm density and new target density, the weights ipi{z) 
become 



Afa„(l - qlPd) 



f{z\xl)f{xl\Z'-')dxi 



The sum in Wi is over all measurements; accordingly its 
expected value can be calculated using the first moment of 
the measurement distribution ||451 . ||251 to obtain 



Ez\xm] 



Afa„(l - qlPd) 



(21) 



A(z|X)r(z|Z*-i)dz (22) 



where /*(z|Z*^^) is the distribution of the measurement 
for the i-th track, and X{z\X) is the first moment of the 
measurement distribution given the true multi-target state X: 



f{z\xl)f{xl\Z''')dxl 



Xiz\X) = Xf,,, + yPdf{z\x') 



(23) 



(24) 



Accordingly, we interpret (22 1 as being the expected measure- 
ment intensity in the vicinity of the predicted measurement 
distribution for the track i. This provides useful intuition on 
the dependence on various problem parameters: 

• Wi decreases as the false alarm intensity increases. The 
reduction is (approximately) linear when the false alarm 
intensity is a small contributor to the overall measurement 
intensity. When the false alarm intensity is the dominant 
contributor to the intensity, little reduction will occur 

• Wi decreases as the probability of existence and detection 
decreases. This occurs both due to the leading factor, 
and due to the reduction in the overall measurement 
intensity. As the probability of existence and detection 
approaches unity, the quantity increases rapidly, due to 
the term qlPd/{l — qlPd) in the leading factor 

• Wi increases as targets become closely spaced, as mul- 
tiple targets contribute to the measurement intensity in 
the region for the track of interest. Highly accurate 
measurements and dynamic models will reduce Wi if the 
additional accuracy permits targets to be distinguished, 
i.e., it reduces the contribution of other targets. If the 
spacing of the targets is such that the additional accuracy 
does not separate the targets, an increase in accuracy will 
increase Wi, increasing convergence time. 
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In summary, if additional SNR permits disambiguation of 
measurement-target association then it will aid convergence; 
in other cases, lower SNR cases will yield better convergence. 
Empirically, we will see in Section |V] that the marginal 
probability estimates also become closer to their exact values 
in lower SNR conditions. A similar analysis will show that the 
quantities W* and depend on the intensity of tracks in the 
vicinity of measurements, although the quantities involved are 
not as easily interpreted. 

F. Stopping criterion 

The emphasis of the previous sections has been on bounding 
the number of iterations required to obtain an error less than a 
particular threshold. Practically, this will lead to an algorithm 
that performs a pre-determined number of iterations. The 
more common approach in LBP is to compare messages to 
previous iterates and terminate when convergence is detected. 
The following theorem, which is a well-known application of 
contraction mapping results, provides a stopping criterion with 
a guaranteed bound on error. 

Theorem 3. If d{Hf., fj,i^_i) < e then 

a{e,W,)a{£,W*) 



l~a{e,W,)a{e,W*) 



Proof: Let /Xj, denote the vector form of the messages 
at iteration k and a = a{e,W*)a{e,W*). Then ^ k' > k, 
d{fj,f.,, fii^,_^) < a^~^e. By repeated application of the 
triangle inequality. 



E 

k'=k+l 



a 



Require: Number of tracks nt, number of measurements 
nit, single-target association weights V i G 

{1, . . . ,nt}, j £ {1,. .. ,mt} (assumes V'j(O) — 1), conver- 
gence criterion S, number of iterations between convergence 
checks N 

Ensure: Marginal probability estimates pij V i S 
{l,...,nj, j e {0,...,mt} 
7] := i log(l + S) {For convergence criterion} 

:= ma,XiJ2j>o'^i{j) {For convergence criterion} 
^ij^i :— l\f i, j > 
repeat 

{Perform N iterations without checking convergence} 
for fc := 1 to TV do 

for J := 1 to ^4 do {Calculate L-R messages} 

end for 

if fc = iV then {For convergence check} 

flj^i fij^i y i,j > 
end if 

for j := 1 to mt do {Calculate R-L messages} 

s := 1 + J2i>a 
Hj^i := l/[s - fi.^j] V i 
end for 
end for 

{Check for convergence} 



d := 



until 



: (log™) /(log d) 



{Calculate marginal estimates} 
for i 1 to rit do 

s l + E,>o^'0')Mi^': 



Pifi := l/s 
end for 



^/sVj >0 



G. Algorithm 

The BP algorithm is summarised in Fig. |3] The algorithm 
incorporates the results of Lemma |4] and Theorem |3] in order 
to test convergence. Convergence checks are performed every 
N iterations in order to avoid the computational overhead 
involved. In practice, we suggest values in the range N E 
{5,..., 20}. 

IV. Relationship to other methods 

A. Junction tree 

The junction tree algorithm f46l, f33l is the standard method 
for conducting exact inference in a cyclic graph. The algorithm 
provides a systematic procedure for merging variables into 
hyper-nodes in order to convert the cyclic graph into a tree, 
and then executes BP on that tree to conduct exact inference. 
As previously described, the complexity of the algorithm can 
be problematic, as the computation increases exponentially in 

^loosely speaking; more correctly, the method forms a tree of hyper-nodes 
that satisfies the running intersection property, thus ensuring that enforcing 
consistency along edges is sufficient. For further details, see 1331 . 



Fig. 3. Algorithm for computing approximate mai'ginal probabilities using 
BP. 



the number of nodes that need to be merged together in order 
to yield a tree-structured graph. 

Junction trees can be applied to data association using the 
graph described in Fig. [T] but this would be quite inefficient, 
since the redundant use of target association variables and 
measurement association variables is of no benefit in the exact 
case, and it increases the number of variables over which 
inference must be conducted. Instead, we use the simpler, yet 
equivalent representation of 



p(ai,...,a;''|Z*)cx 



n ^cK'«t') 



(25) 
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Fig. 4. Example of graphical formulation used as an input to the junction 
tree algorithm. In this case, there is no measurement with non-zero weight for 
both target 1 and target 4, hence there is no edge between the corresponding 
nodes. 



where 

£ 



a\ = a\ > 



(26) 



I 1, otherwise 

{(z,i')|3 i e {1,. . .,nt},i' S {1,. . . ,nt}, 

j e {1, . . .,mt} s.t. i ^ i', Mj)^^'U) > 0} 

(27) 

We assume that the node potentials tpiU) have been thresh- 
olded such that they are zero for infeasible associations. This in 
turn creates the sparsity in £ which is exploited by the junction 
tree algorithm. While BP could also be applied directly to the 
cyclic graph in this formulation (as proposed in ll47l . Il33l 
Box 12. D]), it was shown in [T6l that it performs poorly (in 
both accuracy and convergence) in comparison to the bipartite 
formulation in ([TJ. An example of this graph is show in Fig.|4] 
The efficient hypothesis management method (EHM-2) 
|fT3l . lfT4l exploits a similar tree-based inference, gaining 
additional efficiency by effectively reducing the alphabet 
within the junction tree hyper-nodes (i.e., eliminating elements 
of the hyper-node alphabet which violate mutual exclusion 
constraints). 

The experiments in Section |V] compare the accuracy and 
computation time of the method proposed in this paper to 
the junction tree algorithm with various thresholds applied to 
yielding various levels of sparsity in the resulting graph. 
The experiments were conducted using the implementation of 
junction tree contained in the library for discrete approximate 
inference l|48l . 

B. Correlation decay 

The recently proposed method of fTSl utilises the correlation 
decay property of statistical physics to obtain a deterministic 
FPTAS for the marginal association probabilities. The algo- 
rithm involves a recursion of the form 

1 



HT,M,j,t) 



(28) 



^ + J:^eTMjMT,M-{J},^,t-l) 



where T (respectively, A^) is the set of tracks (respectively, 
measurements) remaining in the recursion, and t is the maxi- 
mum number of recursion steps to perform. Close inspection 
reveals that these equations are almost identical to ( [T2| and 
( [T3| ); the difference is that the recursion in ( |28] l and (|29|l never 
revisits nodes in its recursion; consequently it is exact, but has 
exponential complexity. 

The theoretical analysis in fTS^ shows that if the number of 
nodes in the graph grows, but the maximum connectivity and 
maximum single-target association weight remains constant, t 
may be chosen such that the error is bounded yet complexity 
is polynomial in the number of tracks and measurements. The 
proof of this theorem, which is based on the earlier work 
1491 , exploits similar properties to the convergence proof in 
Section Hira 

The experiments in Section [V] compare the accuracy and 
computation time of the correlation decay algorithm to the 
method proposed in this paper 

V. Experiments 

We consider a series of single time-step problems involving 
targets on a regularly spaced grid. Although the experiment 
involves only a single time step, the track covariances are 
preinitialised by simulating 30 time steps of the standard 
constant velocity model: 



t\t-i 
Pt\t 
K 

with 

F = I2x2 ' 



FPt- 



P, 



t|t-ii 



Pi 



t\t-i 



KHP, 



Ptit^iH^{HPtit_,H^ 

1 




target not detected 
target detected 



T 
1 



Q = ql2x2 • 



T3/3 rV2 
T 



(29) 



T = I, q = 0.01, H = /2X2 «) [ 1 ], and i? = r/2x2, 
commencing with Po|o equal to zero. The track estimates were 
initialised to the true target positions, corrupted with additive 
Gaussian noise with covariance distributed according to the 
prior covariance for the target, generated through independent 
simulation of each target. Gating was performed with a thresh- 
old such that the probability of excluding the target-derived 
measurement was lO^"*. The area populated with false alarms 
was sufficiently large to cover these gates. 
Three groups of experiments were conducted: 

1) Six targets (arranged in a regular 2x3 grid), varying 
target spacing between and 10 units, considering cases 
with Pd G {0.3, 0.6, 0.9}, Afa G {0.01, 0.0316, 0.1}, and 
r G {0.1,1,10} 

2) Targets in a n x 3 regular grid (with n G {2, . . . , 30}) 
with target spacing set to 3 units, P^ ~ 0.6, Afa — 0.01 
and r = 1 

3) Targets in a n x n regular grid (with n G {2, . . . , 10}) 
with target spacing set to 3 units, Pd ~ 0.6, Afa = 0.01 
and r = 1 

A number of example experiments are shown in Fig. |5] 

For each parameter (spacing, Pd, etc), 1000 single time step 
Monte Carlo trials were performed. The total number of Monte 
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Fig. 5. Examples of single Monte Carlo trials. True target positions are shown as , measurements (false alarms and true measurements) are shown as X, 
and prior track estimates are shown as +. Feint grey lines show the edges in the graph, connecting track estimates and measurements which form feasible 
associations. 



Carlo trials performed for each algorithm was 395,000. The 
experiments were performed using a dual processor Intel Xeon 
E5-2670 server with Matlab Parallel Computing Toolbox, 
utilising 12 worker threads. 

To evaluate the various algorithms compared, we calculated 
the average maximum error in the marginal probabilities 



calculated by each algorithm, and the average computation 
time. The comparison is performed on the basis of the accuracy 
of the marginal estimates in order to exclude down-stream 
effects such as coalescence, mixture reduction, etc. While 
these effects are significant, they are separate issues to the ap- 
proximation of the marginal probabilities, and any conclusion 
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reached in an experiment including these effects is specific to 
the full gamut of approximations made in the tracking system. 
We define the average maximum error as being the average 
value of the largest element of the difference between the 
marginal distribution of a target association variable estimated 
by the algorithm under test, and the corresponding reference 
value, thus averaging is performed over both Monte Carlo 
trials and over targets in those trials. In the first experiment, the 
reference value is the exact marginal distribution, calculated 
using the junction tree algorithm with a threshold of 10^^ 
(which is applied to the weights utilised in all algorithms). 
The size of the problems in the second and third experiments 
prevented exact computation, thus the reference values were 
calculated using MCMCDA with 10^ samples. A second 
instance of this algorithm was included as a test, providing 
an estimate of the expected error in the reference values. 

A. Comparison algorithms 

The algorithms compared in the experiments were: 

• The BP algorithm detailed in Fig. |3] A vectorised, 
Matlab-based implementation was utilised. 

• The junction tree algorithm described in Section |IV-A| 
with weights thresholded to 10^'^, 10^^ and 10^^, in- 
ducing varying degrees of sparsity, and thus exploring 
the accuracy versus computation time trade-off. The algo- 
rithm was implemented using the general-purpose libDAI 
system |48 |. 

. The MCMCDA algorithm of |15J, using 10^, 10*^ and 
10^ MCMC steps. The algorithm was initialised with 
the MAP association (calculated via an auction) to avoid 
the need for burn-in. The implementation was written in 
Matlab and compiled into C-n- using the Matlab Coder. 

• The correlation decay algorithm described in Sec- 
tion IV-B using t G {3, 5, 7}. The implementation was 



written in C++. 

• The linear multitarget integrated existence PDA 
(LMIPDA) algorithm of 1501. The track existence 
probabilities were set to unity for comparison to 
non-IPDA methods. A vectorised, Matlab-based 
implementation was utilised. 

• The approximate Bakhtiar-Alavi-Amoozegar (BAA) al- 
gorithm from ifSn . which was shown to be the approx- 
imation of choice in fV2\. A vectorised, Matlab-based 
implementation was utilised. 

The preliminary version of this paper lfT6l included results 
for PDA {i.e., ignoring the presence of adjacent targets), and 
BP applied directly to the cyclic graph in the alternative 
formulation in Fig. |4] (as discussed in Section IV-A i. These 
were excluded from this comparison as the errors they com- 
mitted were considerably worse than the algorithms we are 
comparing. 

B. Results and discussion 

The results of the comparison are shown in Fig. [6] The 
rows of plots alternate between average marginal error, and 
average computation time (per simulation). Plots with shaded 



backgrounds correspond to the second and third experiments 
{i.e., varying numbers of targets), while the remaining plots 
correspond to the six target experiments. The a;-axis in the 
six-target experiments shows the spacing of the true positions 
of the targets in the regular grid, varying between and 10 
units. The first two rows of plots show results varying the false 
alarm rate with Afe G {0.01,0.0316,0.1}, with Pd = 0.6 and 
r = 1. The first case ~ 0.01 is the baseline, to which 
the following cases compare. The second two rows show the 
effect of lowering or raising the Pd, i.e., Pd G {0.3,0.9}. 
The final two rows show the effect of raising or lowering 
the measurement noise, i.e., r G {0.1, 10}. The top two plots 
in the shaded box show the second experiment (with 3 x n 
targets varying n G {2, . . . , 30}, where the x-axis shows the 
total number of targets), and the third experiment (with nx n 
targets varying n G {2. 10}, again with the x-axis showing the 
total number of targets). In each of these latter cases, the target 
spacing is fixed to 3 units, and baseline parameters apply. 

We make the following observations: 

• BP (blue) exhibits excellent performance, with average 
errors of 0.015 or less in most cases, except for the high 
Pd case {Pd = 0.9) and the low measurement noise case 
(r — 0.1); in these two cases, the errors are in the 0.02 — 
0.04 range. Note that these two cases correspond to high 
SNR. The large scale problems (3 x n and n x n) show 
little or no indication that the accuracy of BP deteriorates 
as the number of targets increases. 

• BP (blue) significantly outperforms LMIPDA (magenta) 
and BAA (cyan), exhibiting errors reduced by a factor of 
ten in most cases. The computational complexity of BP 
is only a few times that of LMIPDA and BAA, averaging 
fractions of a millisecond in all of the six-target problems, 
and a little over a millisecond even in the 90 and 100 
target problems. 

• LMIPDA (magenta) and BAA (cyan) appear perform 
similarly to each other across the board. BAA performs 
significantly better in high false alarm and high mea- 
surement noise cases, while LMIPDA performs better in 
low measurement noise cases. The computation time is 
comparable. 

• Junction tree (black) exhibits excellent performance with 
smaller thresholds, but at a high computational cost. With 
the threshold set to 0.1 (black dash-dotted), the error 
is globally worse than BP, and often much worse. The 
computational complexity in this case is 1-2 orders of 
magnitude higher than BP, and the 9x9 and 10 x 10 
experiments were unable to be completed due to the 
high memory requirements of the algorithm. With the 
threshold set to 0.01 or 0.001, the accuracy of junction 
tree is generally much better than BP, but the computation 
time is 2-4 orders of magnitude higher than BP. Most 
experiments in the 3x n and nx n cases were unable to 
be completed due to excessive memory requirements. 

• The performance of MCMCDA (green) with 10^ itera- 
tions (dot-dashed) is generally comparable to BP in the 
six-target cases, but it performs progressively worse in 
cases with larger number of targets. In the larger cases. 
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Fig. 6. Results of experiments. First, third and fifth rows of plots show the average worst-case error in the marginal probabilities, averaged over targets 
and Monte Carlo trials. Second, fourth and sixth rows show the average computation tiine. Unshaded plots show results of the first experiment (involving 
six targets on a regular grid with spacing varied on the x-axis), while shaded plots show the second and third experiments (with 3 X n and nxn targets 
and respectively). Colours show algorithms as BP, LMIPDA, riAA, MCMCDA (with dot-dashed, dashed and solid representing 10''', 10® and 10^ iterations), 
correlation decay (with dot-dashed, dashed and soUd representing 3, 5 and 7 iterations) and junction tree (with dot-dashed, dashed and solid representing 
thi-esholds of 10"\ 10"^ ^nd 10"^). 



BP performs similar to or better than MCMCDA with 10^ 
iterations. The complexity of MCMCDA is 2-4 orders of 
magnitude higher than BP depending on the number of 
samples used. 



• BP generally outperforms the correlation decay method 
(red) with < = 3 (dot-dashed) and t = 5 (dashed). 
With t = 7, BP generally performs better for tighter 
target spacings, and correlation decay performs better 
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with larger target spacings. The correlation decay method 
performs particularly poorly in the case with Pd = 0.9. 
The computational complexity of the correlation decay 
method varies greatly through the experiments. In some 
cases (e.g., r — 0.1) it is slightly faster than BP. In many 
other cases its complexity is 2-3 orders of magnitude 
slower, and in the large scale problems, it is 4-5 orders 
of magnitude slower. 

VI. Conclusion 

This paper has introduced a new approximate method for 
solving data association problems using BP on a particu- 
lar graphical model formulation. While BP is generally not 
guaranteed to converge on cyclic graphs, we have proven 
convergence on the formulation studied, and bounded the 
computation required to attain convergence. While the method 
is approximate, the experiments in Section [V] reveal an highly 
favourable comparison with state-of-the-art methods in the 
accuracy versus computation time trade-off. 
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